Reproducible data collection

Aud Halbritter

Plant Functional Trait Courses

Design spreadsheets

Spreadsheet content

  • Date, time
  • Location: region/site
  • Experimental design: transect, block, plot, treatment, replicate
  • Organism: species/population/genet
  • Unique ID for sample/observation
  • Response
  • Predictors
  • METADATA: recorder/scribe, weather, notes
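In R, a data sheet with these columns could look like the sketch below, with one row per observation and one column per variable. All column names and values are hypothetical and only illustrate the list above.

```r
# Hypothetical data sheet: one row per observation, one column per variable
traits <- data.frame(
  id            = c("obs_001", "obs_002", "obs_003"),       # unique ID
  date          = c("2023-07-15", "2023-07-15", "2023-07-16"),
  site          = c("site_1", "site_1", "site_2"),           # location
  block         = c(1L, 1L, 2L),                             # experimental design
  treatment     = c("control", "warming", "control"),        # predictor
  species       = c("species_a", "species_a", "species_b"),  # organism
  leaf_area_cm2 = c(2.4, 3.1, 2.8),                          # response
  recorder      = c("rec_1", "rec_1", "rec_2"),              # metadata
  notes         = c(NA, "leaf damaged", NA)                  # metadata
)

# Each ID should occur only once
stopifnot(anyDuplicated(traits$id) == 0)
str(traits)
```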

Design spreadsheet - paper

Design spreadsheet - digital

Design spreadsheet - data validation

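The same validation rules can be re-checked in R once the data are digitized. A minimal sketch, assuming a hypothetical set of allowed treatments and a plausible height range of 0 to 100 cm:

```r
# Hypothetical digitized data containing two entry errors
plots <- data.frame(
  plot_id   = c("P01", "P02", "P03", "P04"),
  treatment = c("control", "warming", "Warming", "control"),
  height_cm = c(12.5, 8.0, 9.3, 250)
)

# Validation rules (assumed for this example)
allowed_treatments <- c("control", "warming")
bad_treatment <- !plots$treatment %in% allowed_treatments
bad_height    <- plots$height_cm < 0 | plots$height_cm > 100

# Rows breaking at least one rule: P03 ("Warming") and P04 (250 cm)
plots[bad_treatment | bad_height, ]
```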

Digitizing spreadsheets

Exercise

Discuss in pairs: are these good spreadsheets?

Exercise

Discuss with your neighbour: are these good spreadsheets?

Tidy data

Long or wide format
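A small base R sketch of converting a wide table (one column per year) into long format (one row per plot and year). The table is hypothetical; in the tidyverse, tidyr::pivot_longer() does the same job.

```r
# Hypothetical wide table: one height column per year
wide <- data.frame(
  plot        = c("A", "B"),
  height_2022 = c(10.2, 8.5),
  height_2023 = c(12.1, 9.0)
)

# Long format: one row per plot and year
long <- reshape(
  wide,
  direction = "long",
  varying   = c("height_2022", "height_2023"),
  v.names   = "height",
  timevar   = "year",
  times     = c(2022, 2023),
  idvar     = "plot"
)
rownames(long) <- NULL
long
```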

Consistency in datasheets
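Inconsistent entries, such as the same species typed in different ways, are easy to spot by listing the distinct values. A minimal sketch with hypothetical entries; the clean-up rules (trim spaces, collapse repeated spaces, capitalise only the first letter) are an assumption that suits species names but not every column.

```r
# Hypothetical species entries typed by several people
species <- c("Carex bigelowii", "carex bigelowii", "Carex bigelowii ",
             "Carex  bigelowii", "Festuca ovina")

# Listing the distinct values reveals the inconsistencies
table(species)

# One possible clean-up: trim and collapse spaces, fix capitalisation
clean <- gsub("\\s+", " ", trimws(species))
clean <- paste0(toupper(substring(clean, 1, 1)), tolower(substring(clean, 2)))
unique(clean)
```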

Consistency - meaningful names


Exercise

Discuss in pairs: which of these are meaningful variable names?

  • T

  • bird_raw

  • jja1b

  • mean

  • data

  • ddd

Consistency - meaningful names

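  • A name can contain letters, numbers, dots (.) and underscores (_)

  • The first character must be a letter or a dot, and a leading dot cannot be followed by a number. Names starting with a dot are hidden: ls() does not list them unless all.names = TRUE.

  • Avoid special characters (e.g. æ, å, ø, ö); they can cause encoding problems on other computers

  • Avoid reserved words (e.g. function, TRUE, if), which R does not allow as names, and names of existing functions (e.g. mean), which are easily confused with the function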
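Base R can illustrate these rules: make.names() turns arbitrary strings into syntactically valid names, and ls() skips names that start with a dot. The example names are hypothetical.

```r
# How R repairs invalid names: "X" is prepended to names starting with a
# number or underscore, invalid characters become ".", and reserved
# words get a "." appended
make.names(c("leaf area", "2nd_measure", "plant.height", "TRUE", "_id"))
# returns "leaf.area" "X2nd_measure" "plant.height" "TRUE." "X_id"

# Objects whose names start with a dot are hidden from ls()
.hidden <- 1
visible <- 2
".hidden" %in% ls()                  # FALSE
".hidden" %in% ls(all.names = TRUE)  # TRUE
```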

Consistency - style

“Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread.”

Consistent style with styler

library(styler)

# Restyle messy code; styler follows the tidyverse style guide by default
style_text("my_function<-function(my_data){
  my_data|> group_by(group)|>
    summarise(mean=mean(variable),se=sd(variable)/sqrt(n()))}")

# Printed result:
my_function <- function(my_data) {
  my_data |>
    group_by(group) |>
    summarise(mean = mean(variable), se = sd(variable) / sqrt(n()))
}

Useful package for style

Consistency - standards

Use international data standards (ISO) when available, e.g. ISO 8601 for dates: yyyy-mm-dd (2023-07-15)
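A sketch of converting dates typed as day/month/year into the ISO format with base R; the input dates are hypothetical.

```r
# Dates recorded in a local day/month/year format (hypothetical values)
raw_dates <- c("15/07/2023", "03/08/2023", "28/06/2023")

# Parse with an explicit format, then write out as yyyy-mm-dd
dates <- as.Date(raw_dates, format = "%d/%m/%Y")
iso <- format(dates, "%Y-%m-%d")
iso
#> [1] "2023-07-15" "2023-08-03" "2023-06-28"

# ISO dates sort chronologically even as plain text
sort(iso)
#> [1] "2023-06-28" "2023-07-15" "2023-08-03"
```

The explicit format string matters: without it, as.Date() expects yyyy-mm-dd or yyyy/mm/dd and cannot be relied on to read day-first dates.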

Questions?